Load Shedding using Window Aggregation Queries on Data Streams
Authors
Abstract
Data stream mining is the process of extracting knowledge structures from continuous, rapidly arriving records. The main challenge in stream mining is that elements arrive so quickly that it is infeasible to store all of them in active storage. To handle such voluminous data, we propose a novel load shedding system based on window aggregate functions over the data stream, in which only those tuples that meet a criterion are accepted. Accepted tuples are passed to another process as a stream, while the remaining tuples are dropped. The proposed model partitions the input stream into windows and decides probabilistically which tuples to drop based on the window function. The best window aggregate function for dropping tuples is identified using three prediction models from data mining: Decision Tree, Naïve Bayes, and Logistic Regression. The results show that the cumulative distance and density rank functions outperform the other methods. Unlike prior methods, our method preserves the uniformity of windows throughout a query plan and consistently delivers subsets of the original query answers with negligible loss in result quality.
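The windowing and probabilistic dropping described in the abstract can be illustrated with a minimal sketch. The abstract does not define the cumulative distance or density rank functions, so the scoring rule below (keep probability proportional to a tuple's distance from the window mean) is an assumption chosen only for illustration, and the names `tumbling_windows`, `shed_window`, `load_shed`, and `keep_fraction` are hypothetical.

```python
import random
from statistics import mean
from typing import Iterable, Iterator, List


def tumbling_windows(stream: Iterable[float], size: int) -> Iterator[List[float]]:
    """Split the input stream into consecutive, non-overlapping windows."""
    window: List[float] = []
    for value in stream:
        window.append(value)
        if len(window) == size:
            yield window
            window = []
    if window:  # emit the final, possibly shorter, window
        yield window


def shed_window(window: List[float], keep_fraction: float,
                rng: random.Random) -> List[float]:
    """Probabilistically drop tuples, guided by a window aggregate.

    Assumed scoring rule (not taken from the paper): a tuple's keep
    probability grows with its absolute distance from the window mean.
    A tuple exactly at the mean therefore gets probability 0.
    """
    centre = mean(window)
    distances = [abs(v - centre) for v in window]
    total = sum(distances)
    if total == 0:
        # All tuples are identical: fall back to uniform random dropping.
        probs = [keep_fraction] * len(window)
    else:
        # Scale so the expected number kept is at most
        # keep_fraction * len(window); each probability is capped at 1.
        n = len(window)
        probs = [min(1.0, keep_fraction * n * d / total) for d in distances]
    return [v for v, p in zip(window, probs) if rng.random() < p]


def load_shed(stream: Iterable[float], window_size: int,
              keep_fraction: float, seed: int = 0) -> Iterator[float]:
    """Emit the accepted tuples as a new stream; the rest are dropped."""
    rng = random.Random(seed)
    for window in tumbling_windows(stream, window_size):
        yield from shed_window(window, keep_fraction, rng)
```

Calling `list(load_shed(readings, window_size=10, keep_fraction=0.5))` on a list of numeric `readings` returns a subset of the input in its original order. Swapping in a different window function only requires changing how `probs` is computed inside `shed_window`.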
Similar resources
Load Shedding in Data Stream Systems
Systems for processing continuous monitoring queries over data streams must be adaptive because data streams are often bursty and data characteristics may vary over time. In this chapter, we focus on one particular type of adaptivity: the ability to gracefully degrade performance via "load shedding" (dropping unprocessed tuples to reduce system load) when the demands placed on the system cannot...
A Review of Window Query Processing for Data Streams
In recent years, progress in hardware technology has made it possible to monitor many events in real time. The volume of incoming data may be so large that monitoring every individual item is intractable. Revisiting any particular record can also be impossible in this environment. Therefore, many database schemes, such as aggregation, join, frequent pattern mining, and indexi...
Improving the accuracy of continuous aggregates and mining queries on data streams under load shedding
Random samples are common in data stream applications due to limitations in data sources and transmission lines, or to load-shedding policies. Here we introduce a formal error model and show that, besides providing accurate estimates, it improves query answer accuracy by exploiting past statistics. The method is general, robust in the presence of concept drift, and minimises uncertainties due ...
Window-aware Load Shedding for Data Streams
Data stream management systems may be subject to higher input rates than their resources can handle. In this case, results get delayed and Quality of Service (QoS) at system outputs may fall below acceptable levels. Load shedding addresses this problem by allowing data loss in exchange for reduced latency. Drop operators are placed at carefully chosen points in a query plan, in order to relieve...
Window Queries over Data
An abstract of the dissertation of Jin Li for the Doctor of Philosophy in Computer Science presented October 17, 2008. Title: Window Queries over Data Streams. Evaluating queries over data streams has become an appealing way to support various stream-processing applications. Window queries are commonly used in many stream applications. In a window query, certain query operators, especially block...